Hello,

Sometimes when creating hard links to the rsync destination directory, it
seems like the new directory (created from the cp -al command) ends up with
all the data. This causes a problem: if the rsync destination directory had
21GB, after the cp -al command it ends up having only 8MB, and rsync then
determines from the source directory that it now requires 21.98GB to update
the destination directory.

Here is an example of a test that I was doing. I have no idea why sometimes
it works like it should and sometimes it doesn't. My destination directory
is called 'Latest'.

[root@backup backup]# du --max-depth=1 -h
21G     ./Latest
21G     .
[root@backup backup]# cp -al Latest/ ktest/
[root@backup backup]# du --max-depth=1 -h
21G     ./Latest
8.7M    ./ktest
21G     .
[root@backup backup]# rm ktest/ -rf
[root@backup backup]# cp -al Latest/ mtest/
[root@backup backup]# du --max-depth=1 -h
21G     ./Latest
8.7M    ./mtest
21G     .
[root@backup backup]# rm mtest/ -rf
[root@backup backup]# cp -al Latest/ test/
[root@backup backup]# du --max-depth=1 -h
21G     ./test
8.3M    ./Latest
21G     .

The last instance is the problem that happens quite often. Now when I
perform an rsync as such:

rsync /share/ /backup/Latest --stats --recursive --archive --times
--modify-window=1 --delete --ignore-errors --no-whole-file
--files-from=/var/www/html/new/var/backup_selections.txt
--exclude-from=/var/www/html/new/var/file-exclude --progress

I get the following results:

Number of files: 53911
Number of files transferred: 52223
Total file size: 21654476720 bytes
Total transferred file size: 21654476720 bytes
Literal data: 21651840443 bytes
Matched data: 0 bytes
File list size: 992872
Total bytes sent: 21657710607
Total bytes received: 1044480

And a du gives me:

[root@backup backup]# du --max-depth=1 -h
21G     ./test
21G     ./Latest
41G     .
It appears that due to the cp -al command not working right as stated
above, the literal changes needed were everything minus the 8.3MB, when in
reality there were very few changes between 'Share' and 'Latest'.

Can someone give any guidance on this issue? There are times when this
will happen several times throughout the 30-day incremental routine, so
the disk requirements are very large. How can I keep all the data in
'Latest' consistently after using the cp -al command?

Thanks,
Max
On Thu 11 May 2006, Max Kipness wrote:
> [root@backup backup]# cp -al Latest/ mtest/
> [root@backup backup]# du --max-depth=1 -h
> 21G     ./Latest
> 8.7M    ./mtest
> 21G     .
> [root@backup backup]# rm mtest/ -rf
> [root@backup backup]# cp -al Latest/ test/
> [root@backup backup]# du --max-depth=1 -h
> 21G     ./test
> 8.3M    ./Latest
> 21G     .
>
> The last instance is the problem that happens quite often. Now when I

No, it's not a problem. It's just that now du encounters the "test"
directory before finding the "Latest" directory. du only counts the
blocks of hardlinked files once, and reports the size under the first
directory such a file is in.

The 8.3M (or 8.7M) is purely the disk blocks needed for the directories
and any symlinks if applicable, it is *not* related to storage of file
contents.

If you run "du -s -h test Latest" (or use the --count-links option) you
will see that each directory is handled separately, and both will have
21G.

> perform an rsync as such:
>
> rsync /share/ /backup/Latest --stats --recursive --archive --times
> --modify-window=1 --delete --ignore-errors --no-whole-file
> --files-from=/var/www/html/new/var/backup_selections.txt
> --exclude-from=/var/www/html/new/var/file-exclude --progress
>
> I get the following results:
>
> Number of files: 53911
> Number of files transferred: 52223
> Total file size: 21654476720 bytes
> Total transferred file size: 21654476720 bytes
> Literal data: 21651840443 bytes
> Matched data: 0 bytes
> File list size: 992872
> Total bytes sent: 21657710607
> Total bytes received: 1044480
>
> And a du gives me:
>
> [root@backup backup]# du --max-depth=1 -h
> 21G     ./test
> 21G     ./Latest
> 41G     .
>
> It appears that due to the cp -al command not working right as stated
> above, the literal changes needed was everything minus the 8.3mb, when
> in reality there were very few changes between 'Share' and 'Latest'.

What's happened is that the files are updated, and the hard link is lost.
Why the files are updated I can't say, it could be due to all sorts of
reasons; perhaps using the --itemize-changes option will help.

Look into the --link-dest option, you can leave out your cp -al pass in
that case.

> Can someone give any guidance on this issue? There are time when this
> will happen several times throughout the 30 day incremental routine so
> the disk requirements are very large. How can I keep all the data in
> 'Latest' consistently after using the cp -al command?

May I suggest the dirvish package, which is a sort of wrapper around rsync
to implement incremental changes? It sounds like what you're trying to do.

http://www.dirvish.org/

Paul Slootman
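Paul's explanation of the du numbers can be reproduced on a scratch directory. The sketch below is an illustration, not something from the thread: it assumes GNU coreutils (`du -b`, `du -l`, `stat -c %h`), uses a temporary directory from `mktemp -d` in place of /backup, and a 1 MiB random file stands in for the real data. It makes a hard-linked copy with `cp -al` and compares what du reports for the parent, with `--count-links`, and for each tree on its own.

```shell
#!/usr/bin/env bash
# Sketch: "cp -al" shares file data, and du counts each inode only once.
# Assumes GNU coreutils (du -b/-l, stat -c). The scratch layout is hypothetical.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/Latest"
# 1 MiB of random (incompressible) data stands in for the real backup contents.
head -c 1048576 /dev/urandom > "$work/Latest/data.bin"

# Make a second tree whose files are hard links to the first; no data is copied.
cp -al "$work/Latest" "$work/test"

# Both names now refer to the same inode, so its link count is 2.
links=$(stat -c %h "$work/Latest/data.bin")

# Default du: the shared inode is counted once, under whichever subdirectory
# du happens to reach first while walking the parent.
du --max-depth=1 -b "$work"
total_once=$(du -sb "$work" | cut -f1)

# --count-links (-l): every name is counted, so the data appears twice.
total_all=$(du -sbl "$work" | cut -f1)

# Separate du runs each see the full tree.
latest_alone=$(du -sb "$work/Latest" | cut -f1)
test_alone=$(du -sb "$work/test" | cut -f1)

echo "links=$links once=$total_once count-links=$total_all"
echo "Latest alone=$latest_alone test alone=$test_alone"
```

Which subdirectory is charged with the 1 MiB in the `--max-depth=1` listing depends on the order in which du walks the parent, which is consistent with 'Latest' sometimes showing only a few megabytes; the parent total stays at roughly one copy either way.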
> What's happened is that the files are updated, and the hard link is
> lost. Why the files are updated I can't say, it could be due to all
> sorts of reasons; perhaps using the --itemize-changes option will help.
>
> Look into the --link-dest option, you can leave out your cp -al pass in
> that case.

Thanks for the info. I think I understand better how the hard linking
works now. I still can't seem to figure out why the hard links are
breaking, though. And now I've noticed that I evidently have similar
issues with hard links on another server. In one instance, the server has
maybe 600MB or so of changes per day and about 19GB of files in total, yet
each incremental directory shows 5GB or so when doing a du. So does that
mean that hard links are breaking daily?

du --max-depth=1 -h /backup
18G     /backup/05-02-2006
5.1G    /backup/04-26-2006
5.4G    /backup/05-05-2006
5.0G    /backup/04-23-2006
5.1G    /backup/04-29-2006
5.1G    /backup/04-27-2006
5.0G    /backup/04-17-2006
5.0G    /backup/04-13-2006
5.4G    /backup/05-08-2006
3.8G    /backup/05-06-2006
5.0G    /backup/04-20-2006
3.9G    /backup/05-07-2006
5.9G    /backup/Current
5.1G    /backup/05-03-2006
3.7G    /backup/05-01-2006
5.0G    /backup/04-21-2006
5.0G    /backup/04-19-2006
5.0G    /backup/04-25-2006
5.0G    /backup/04-14-2006
3.6G    /backup/04-15-2006
5.0G    /backup/04-24-2006
3.7G    /backup/04-28-2006
3.7G    /backup/04-30-2006
3.9G    /backup/05-09-2006
3.9G    /backup/05-11-2006
3.6G    /backup/04-16-2006
5.0G    /backup/04-18-2006
3.9G    /backup/05-10-2006
5.1G    /backup/05-04-2006
146G    /backup

[root@backup reports]# du -sh /backup/05-02-2006/ /backup/04-26-2006/
/backup/05-05-2006/ /backup/04-23-2006 /backup/04-29-2006 /backup/04-27-2006
/backup/04-17-2006 /backup/04-13-2006 /backup/05-08-2006 /backup/05-06-2006
/backup/04-20-2006 /backup/05-07-2006 /backup/Current /backup/05-03-2006
/backup/05-01-2006 /backup/04-21-2006 /backup/04-19-2006 /backup/04-25-2006
/backup/04-14-2006 /backup/04-15-2006 /backup/04-24-2006 /backup/04-24-2006
/backup/04-30-2006 /backup/05-09-2006 /backup/05-11-2006 /backup/04-16-2006
/backup/04-18-2006 /backup/05-10-2006 /backup/05-04-2006
18G     /backup/05-02-2006/
18G     /backup/04-26-2006/
19G     /backup/05-05-2006/
18G     /backup/04-23-2006
18G     /backup/04-29-2006
18G     /backup/04-27-2006
18G     /backup/04-17-2006
18G     /backup/04-13-2006
19G     /backup/05-08-2006
19G     /backup/05-06-2006
18G     /backup/04-20-2006
19G     /backup/05-07-2006
19G     /backup/Current
18G     /backup/05-03-2006
18G     /backup/05-01-2006
18G     /backup/04-21-2006
18G     /backup/04-19-2006
18G     /backup/04-25-2006
18G     /backup/04-14-2006
18G     /backup/04-15-2006
18G     /backup/04-24-2006
18G     /backup/04-24-2006
18G     /backup/04-30-2006
19G     /backup/05-09-2006
19G     /backup/05-11-2006
18G     /backup/04-16-2006
18G     /backup/04-18-2006
19G     /backup/05-10-2006
18G     /backup/05-04-2006

Should each day/directory show around 600MB? I definitely don't think 18GB
of data should need 146GB of storage in total.

Here are the stats for the last backup. Should the Matched and Literal
data add up to the total file size?

Number of files: 50285
Number of files transferred: 113
Total file size: 16191157376 bytes
Total transferred file size: 4581163348 bytes
Literal data: 673846205 bytes
Matched data: 3905515136 bytes
File list size: 932272
Total bytes sent: 675118017
Total bytes received: 525414

sent 675118017 bytes  received 525414 bytes  591890.87 bytes/sec
total size is 16191157376  speedup is 23.96

One thing to note is that the source data is coming from CIFS-mounted
Windows shares. The rsync command I'm using for this one is:

/usr/bin/rsync /share/ /backup/Current/ --stats --recursive --partial
--archive --times --modify-window=1 --delete-after --delete-excluded
--ignore-errors --no-whole-file --files-from=/var/www/html/backup/adlist.txt
--exclude-from=/scripts/file-exclude --log-format="%f %l %b"

And I'm using rsync 2.6.3 on this one. Any additional information anyone
has as to what might be my issue would be appreciated.

Thanks,
Max
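One way to check whether links are breaking from day to day is to measure, per snapshot, how much data sits in regular files with a link count of 1, i.e. files that share their data with no other snapshot. The sketch below assumes GNU find (`-printf`); `unique_bytes` is a hypothetical helper name, and the two-day scratch layout only simulates what happens when rsync updates a file inside a `cp -al` tree: by default rsync writes the updated file under a temporary name and renames it into place, so the new version gets its own inode.

```shell
#!/usr/bin/env bash
# Sketch: bytes per snapshot held in files with a single link, i.e. data that
# is NOT shared with any other snapshot. Assumes GNU find (-printf).
set -euo pipefail

# unique_bytes DIR: hypothetical helper, not part of rsync or coreutils.
unique_bytes() {
    find "$1" -type f -links 1 -printf '%s\n' | awk '{ sum += $1 } END { print sum + 0 }'
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Day 1 snapshot: two files.
mkdir "$work/day1"
head -c 1000 /dev/zero > "$work/day1/a.dat"
head -c 500 /dev/zero > "$work/day1/b.dat"

# Day 2 starts as a hard-linked copy, as with "cp -al".
cp -al "$work/day1" "$work/day2"

# Simulate rsync updating b.dat: write a new file, then rename it over the old
# name. day2/b.dat gets a new inode and its link to day1/b.dat is gone.
head -c 700 /dev/zero > "$work/day2/.b.dat.tmp"
mv "$work/day2/.b.dat.tmp" "$work/day2/b.dat"

day1_unique=$(unique_bytes "$work/day1")
day2_unique=$(unique_bytes "$work/day2")

for d in "$work"/day*; do
    printf '%s\t%s bytes unique\n' "$(basename "$d")" "$(unique_bytes "$d")"
    # List the unshared files themselves, relative to the snapshot root.
    (cd "$d" && find . -type f -links 1 -print)
done
```

Run against real snapshot directories, a large figure for every day would mean that big files are being rewritten daily, whether because their contents really change or because some attribute (size, mtime, permissions) differs from the previous copy.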
> You could of course (right after an rsync run) do a
> "cd newdir; find . -type f -links 1 -print" and then randomly check a
> couple and compare all their attributes such as mtime, permissions to
> the previous dir. (I still recommend using the --link-dest thing over
> using cp -al first.)

OK, I think I've figured out the problem with this one, although I'm not
exactly sure of the reason. I have now started using --link-dest and this
works great. Here again is the stats screen:

Number of files: 50285
Number of files transferred: 38
Total file size: 16193254538 bytes
Total transferred file size: 4077908049 bytes
Literal data: 86201342 bytes
Matched data: 3989904700 bytes
File list size: 945440
File list generation time: 6.615 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 87436048
Total bytes received: 539014

sent 87436048 bytes  received 539014 bytes  97913.26 bytes/sec
total size is 16193254538  speedup is 184.07

Well, it turns out that there is a Microsoft backup file (a .bkf file),
around 4GB in size, that is being changed daily.

Now my question (I think the final one) is why the entire file seems to
be transferred even though rsync obviously detects that only a fraction
of the file has changed. The Literal data line shows 86201342 bytes of
changes, which appears correct. Also, since I'm using the option
--log-format="%f %l %b", I see the following for the file in question:

SERVER/E$/exchange.bkf 4076087296 86454659

Isn't this stating that the file size is 4076087296, and the changes to
the file are 86454659?

So why is the entire file transferring each day? I'm using the
--no-whole-file option. Here are the rsync command options I used for the
latest test:

rsync /share/ /backup/05-13-2006/ -v --link-dest=/backup/05-12-2006/
--stats --recursive --archive --times --modify-window=1 --delete
--ignore-errors --files-from=/var/www/html/backup/adlist.txt
--exclude-from=/scripts/file-exclude --no-whole-file
--log-format="%f %l %b" 2> errors.log 1> stats.log

In the previous posts I stated that du showed every incremental directory
to be around 4-5GB in size. This is because each day the exchange.bkf has
some change associated with it, so I guess the file cannot be linked. So
in reality, if you have very large files that have very small changes
applied, hard links really serve no purpose, correct? And I assume there
is nothing else that can be done with these large files to conserve
space?

Thanks,
Max
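For reference, a daily run with --link-dest (the approach adopted above) can be sketched as follows. The paths are scratch stand-ins for /share and /backup/<date>, and the script only calls rsync if it is installed. `--itemize-changes` is added so that every file rsync updates is listed with a per-file change summary showing which attributes made it differ, the diagnostic Paul suggested earlier. The --link-dest path is absolute because rsync resolves a relative --link-dest directory against the destination directory.

```shell
#!/usr/bin/env bash
# Sketch: a daily snapshot with --link-dest instead of "cp -al" followed by rsync.
# Directory names are hypothetical stand-ins for /share and /backup/<date>.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
src="$work/share"
prev="$work/backup/05-12-2006"
dest="$work/backup/05-13-2006"

mkdir -p "$src" "$work/backup"
echo "unchanged" > "$src/report.txt"

# Pretend yesterday's snapshot already exists. cp -a keeps mtime and
# permissions, so rsync will consider report.txt unchanged.
cp -a "$src" "$prev"

# Files identical to those in $prev become hard links to them; changed or new
# files are written into $dest.
args=(-a --delete --itemize-changes --link-dest="$prev" "$src/" "$dest/")

if command -v rsync >/dev/null 2>&1; then
    rsync "${args[@]}"
    links=$(stat -c %h "$dest/report.txt")   # 2 when the hard link was made
else
    echo "rsync not installed; would run: rsync ${args[*]}"
    links=skipped
fi
echo "link count of report.txt in the new snapshot: $links"
```

Unchanged files simply gain another link to the previous day's inode. A file like exchange.bkf that differs every day is still stored again in full, since a hard link can only share identical contents.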
> > Number of files: 50285
> > Number of files transferred: 38
> > Total file size: 16193254538 bytes
> > Total transferred file size: 4077908049 bytes
> > Literal data: 86201342 bytes
> > Matched data: 3989904700 bytes
> > File list size: 945440
> > File list generation time: 6.615 seconds
> > File list transfer time: 0.000 seconds
> > Total bytes sent: 87436048
> > Total bytes received: 539014
> >
> > sent 87436048 bytes  received 539014 bytes  97913.26 bytes/sec
> > total size is 16193254538  speedup is 184.07
> >
> > Well, it ends up that there is a Microsoft backup file (a .bkf file)
> > that is around 4GB in size that is being changed daily.
> >
> > Now my question (I think the final one) is why the entire file seems to
> > be transferred even though rsync obviously detects that only a fraction
> > of the file has changed. Evidently the Literal Data shows 86201342 of
> > changes which appears correct. Also, since I'm using option
> > --log-format="%f %l %b", I see on the file in question, the following
> > results:
> >
> > SERVER/E$/exchange.bkf 4076087296 86454659
> >
> > Isn't this stating that the file size is 4076087296, and the changes to
> > the file are 86454659?
> >
> > So why is the entire file transferring each day. I'm using the
> > --no-whole-files option. Here is the rsync command options I used for
> > the latest test:
>
> Rsync has NO guarantee that the only changes are to the END.
> Rsync has to work when the changes are to the beginning or scattered
> throughout.
> Rsync goes to a lot of trouble to find and transmit only the changes.
> This is extremely useful over slow and/or erratic network connections.
> This is probably significantly slower over gigabit ethernet.
>
> Also, be aware that of the times that are representable in Unix,
> DOS and derivatives are only capable of representing half of them.
> Depending on whatever, you may have DOS files that are always seen
> as being different because the times do not and cannot match.

Rsync seems to be detecting the changes in this large file. Based on what
you are saying, rsync in this case knows what the changes in the file are,
roughly 86MB, but cannot transmit only the changes and therefore transmits
the entire 4GB file? If there is no way around this, I guess I'll have to
live with it.

> > rsync /share/ /backup/05-13-2006/ -v --link-dest=/backup/05-12-2006/
> > --stats --recursive --archive --times --modify-window=1 --delete
> > --ignore-errors --files-from=/var/www/html/backup/adlist.txt
> > --exclude-from=/scripts/file-exclude --no-whole-file --log-format="%f %l
> > %b" 2> errors.log 1> stats.log
> >
> > In the previous posts I stated that du showed every incremental
> > directory to be around 4-5gb in size. This is because each day the
> > exchange.bkf has some change associated with it, so I guess the file
> > cannot be linked. So in reality if you have very large files that have
> > very small changes applied, hard-links really serve no purpose, correct?
> > And I assume there is nothing else that can be done with these large
> > files to conserve space?
>
> Hard links are how Unix names files (the file itself).
> Hard links allow one file to have more than one name.
> Any change to the file (by any name) is done to the file and shows up in
> all the other names.
> When the last name (actually reference) is deleted, the file is deleted.
> There is no "yes, but" associated with hard links.
> Hard links will not help save space on similar but not exactly the same
> files.

That's what I figured; I just wanted to clarify. So if you had a directory
with ten 1GB files, and each day you made a 10KB change to each, every
incremental directory would hold the full 10GB, with nothing saved by
hard-linking.

Thanks again.

Max
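The point about hard links reduces to simple arithmetic: a file that changed at all is stored again in full, however small the edit. The sketch below uses a hypothetical helper, `snapshot_bytes`, which assumes the same set of files changes every day and ignores directory overhead.

```shell
#!/usr/bin/env bash
# Sketch: back-of-the-envelope snapshot storage when hard links are the only
# deduplication. Any file that changes is stored again in full.
set -euo pipefail

GiB=$((1024 * 1024 * 1024))

# snapshot_bytes TOTAL CHANGED DAYS: hypothetical helper.
#   TOTAL:   bytes in the first full snapshot
#   CHANGED: full size of the files modified each day (not the size of the edits)
#   DAYS:    number of snapshots kept
snapshot_bytes() {
    local total=$1 changed=$2 days=$3
    echo $(( total + changed * (days - 1) ))
}

# Ten 1 GiB files, each touched every day, 30 snapshots kept.
ten_files=$(snapshot_bytes $((10 * GiB)) $((10 * GiB)) 30)

# Same data, but only one 1 GiB file changes each day.
one_file=$(snapshot_bytes $((10 * GiB)) $((1 * GiB)) 30)

echo "all ten files change daily: $(( ten_files / GiB )) GiB"
echo "one file changes daily:     $(( one_file / GiB )) GiB"
```

By the same reasoning, a roughly 4GB exchange.bkf file that changes daily adds roughly 4GB of new storage per snapshot, which is in the same range as the 3.6G to 5.9G per-directory figures in the du listing earlier in the thread.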
> Recheck the statistics:
> 4GB file: something like 4,000,000,000 bytes
> Total bytes sent: 87,436,048 -- MUCH LESS than 4 GB
> Total bytes received: 539,014
>
> > > Total transferred file size: 4077908049 bytes

Sorry, got it now. I missed the 'Total bytes sent' stat and was assuming
that 'Total transferred file size' meant what was actually transferred.
For this test I'm on a 100Mb local connection, so it seemed like it was
taking long enough for that to be true.

Thanks for all the help.

Max
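The distinction that resolved the question is between 'Total transferred file size' (the size of the files rsync updated) and 'Total bytes sent' (what actually crossed the connection). The sketch below pulls those counters out of --stats output, using the figures quoted in this thread as sample input; `stat_value` is a hypothetical helper, and the speedup formula (total file size divided by bytes sent plus bytes received) is the one that reproduces rsync's reported 184.07.

```shell
#!/usr/bin/env bash
# Sketch: compare bytes actually sent with the size of the files rsync updated,
# using rsync --stats output (e.g. a stats.log) as input.
set -euo pipefail

# Sample input: --stats lines quoted in this thread.
stats=$(cat <<'EOF'
Total file size: 16193254538 bytes
Total transferred file size: 4077908049 bytes
Literal data: 86201342 bytes
Matched data: 3989904700 bytes
Total bytes sent: 87436048
Total bytes received: 539014
EOF
)

# stat_value LABEL: hypothetical helper, prints the number after "LABEL: ".
stat_value() {
    printf '%s\n' "$stats" | awk -v key="$1" -F': ' '$1 == key { split($2, f, " "); print f[1] }'
}

total=$(stat_value "Total file size")
transferred=$(stat_value "Total transferred file size")
sent=$(stat_value "Total bytes sent")
received=$(stat_value "Total bytes received")

# Speedup as rsync reports it: total file size / (bytes sent + bytes received).
speedup=$(awk -v t="$total" -v s="$sent" -v r="$received" 'BEGIN { printf "%.2f", t / (s + r) }')

# Share of the updated files' size that actually went over the wire.
pct_sent=$(awk -v s="$sent" -v x="$transferred" 'BEGIN { printf "%.1f", 100 * s / x }')

echo "speedup: $speedup"
echo "bytes sent are ${pct_sent}% of the transferred file size"
```

With these figures only about 2% of the roughly 4GB of updated file data was sent, which matches the Literal data count rather than the full size of exchange.bkf.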