Krzysztof Nosek
2009-Jul-06 16:45 UTC
How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)
Hello group, I'm having a very hard time rsyncing efficiently a MySQL database which contains very large binary blobs. (Actually, it's the database of Mantis bug tracker [http://www.mantisbt.org/], with file attachments stored directly in the table rows. I know it's a bad idea from many other reasons, but let's say it was given to me as such.) First, I was dumping the database with mysqldump to an uncompressed dump and rsyncing it. Such a dump would use some 34G of disk space. My rsync took no less than two hours then, regardless of number of actual changes in the database. (Later I realized that my dump was perhaps not enough line-oriented, so I tried some --skip-extended-insert options of mysqldump, with no luck.) Currently I'm using mysqlhotcopy to rsync raw database data. It still takes no less than one hour (which is 50% of the previous results, mostly because the raw data of the database is just 17G. The single biggest file is 16909M). How to do it better? I thought that rsync would be magically able to extract changes from my data. However, it looks like it diffs the 17G file everyday again and again, even it stayed intact. The times/attributes of files do not change across the copies either, as mysqlhotcopy preserves them. Please help! Sorry if my problem is lame, I'm relatively new to the field. Regards, Krzysztof Nosek
malayter@gmail.com
2009-Jul-06 17:26 UTC
How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)
On Jul 6, 2009 4:41am, Krzysztof Nosek <krzysztof.nosek@techland.pl> wrote:> I'm having a very hard time rsyncing efficiently a MySQL database which > contains very large binary blobs.<snip>> How to do it better? I thought that rsync would be magically able to > extract changes from my data. However, it looks like it diffs the 17G > file everyday again and again, even it stayed intact. The > times/attributes of files do not change across the copies either, as > mysqlhotcopy preserves them.There was a recent thread on this list regarding rsync of MS SQL Server database backups. Most of those suggestions would apply to the mysql case as well. You would want to make sure you are telling rsync to use a block size that is the same as or an integral multiple of the mysql block size. You will also want to minimize unnecessary change in the source database (index rebuilds are a major culprit). -- RPM -------------- next part -------------- HTML attachment scrubbed and removed
Martin Schwenke
2009-Jul-07 21:16 UTC
How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)
>>>>> "Krzysztof" == Krzysztof Nosek <krzysztof.nosek@techland.pl> writes:Krzysztof> First, I was dumping the database with mysqldump to an Krzysztof> uncompressed dump and rsyncing it. Such a dump would Krzysztof> use some 34G of disk space. [...] Some simple suggestions in addition to the one about experimenting with block size: * Does the output of mysqldump compress well with gzip? If so, you could try compressing it with a version of gzip that supports the --rsyncable option. The version in Debian or Ubuntu Linux supports this option. * If you're confident that you have a good line-oriented dump, how does diff cope when comparing 2 dumps? rsync and diff obviously use very different schemes for figuring out what is different. However, sometimes doing a diff will show you something obvious that you have missed. Obviously you'll need 68GB of free disk to be able to try that! * If you're not telling rsync to transfer just a single file, do the filenames match at both ends of the transfer? You not doing something like creating a dump with the date in the filename, are you? Sorry, no rocket science there... ;-) peace & happiness, martin
Krzysztof Nosek
2009-Jul-10 12:48 UTC
How to make big MySQL database more diffable/rsyncable? (aka rsyncing big files)
Hello,> So I do not think the basic data structure is the problem, > unless mysql hotcopy does something really strange like inserting a > timestamp or other changing data info every few KB in the output stream. >No, really, mysqlhotcopy performs just a raw file system copy of /var/lib/mysql taken from the locked database. If nothing particular happens meanwhile in the running database, the copy is 1:1 with the original. Easy to check with any smaller database.> I would suggest trying a tool like xdelta (on the same machine) against two > consecutive backup files, just to see if it can extract similarities. If > xdelta can find significant matched data, rsync should be able to as well. >I'd love to do that but I can't make it actually working: xdelta: open ../mantis_game_20090707/mantis_bug_file_table.MYD failed: Value too large for defined data type Same for dumps. I think it's running out of memory just like diff does with files that large, isn't it?> Also, is the transfer CPU bound or network bound? Can you send the output > of rsync with the --stats and -v options? >I'm pretty sure it's the network. The rsync jobs on both machines use no more than 30-50% of the CPU. I may be wrong - please find the log attached. Perhaps I am memory bound, could it be? Regards, nosek -------------- next part -------------- $ rsync -kavzO --del --stats --block-size=32768 --delete-excluded \ --exclude=phpThumb/cache \ --exclude=glpi.techland.pl/files/_cache \ --exclude=glpi.techland.pl/files/_sessions \ --exclude=moinmoin/data/cache \ --exclude=tmp/cgisess \ --exclude=*.old --exclude=*_old \ knosek@mantis.techland.pl:/ftp2/backup/mysql-mirror /home/nosek/rsync-test2/ftp2 2>&1 \ | tee -a /home/nosek/rsync-test2/ftp2/mysql-mirror.log receiving incremental file list # ...27 entries of some smaller databases here... mysql-mirror/mantis_game/mantis_bug_file_table.MYD mysql-mirror/mantis_game/mantis_bug_file_table.MYI mysql-mirror/mantis_game/mantis_bug_history_table.MYD mysql-mirror/mantis_game/mantis_bug_history_table.MYI mysql-mirror/mantis_game/mantis_bug_monitor_table.MYD mysql-mirror/mantis_game/mantis_bug_monitor_table.MYI mysql-mirror/mantis_game/mantis_bug_relationship_table.MYD mysql-mirror/mantis_game/mantis_bug_relationship_table.MYI mysql-mirror/mantis_game/mantis_bug_table.MYD mysql-mirror/mantis_game/mantis_bug_table.MYI mysql-mirror/mantis_game/mantis_bug_text_table.MYD mysql-mirror/mantis_game/mantis_bug_text_table.MYI mysql-mirror/mantis_game/mantis_bugnote_table.MYD mysql-mirror/mantis_game/mantis_bugnote_table.MYI mysql-mirror/mantis_game/mantis_bugnote_text_table.MYD mysql-mirror/mantis_game/mantis_bugnote_text_table.MYI mysql-mirror/mantis_game/mantis_custom_field_string_table.MYD mysql-mirror/mantis_game/mantis_custom_field_string_table.MYI mysql-mirror/mantis_game/mantis_email_table.MYD mysql-mirror/mantis_game/mantis_email_table.MYI mysql-mirror/mantis_game/mantis_filters_table.MYD mysql-mirror/mantis_game/mantis_filters_table.MYI mysql-mirror/mantis_game/mantis_project_version_table.MYD mysql-mirror/mantis_game/mantis_project_version_table.MYI mysql-mirror/mantis_game/mantis_tokens_table.MYD mysql-mirror/mantis_game/mantis_tokens_table.MYI mysql-mirror/mantis_game/mantis_user_table.MYD mysql-mirror/mantis_game/mantis_user_table.MYI # ... 32 entries of some smaller databases here ... Number of files: 1617 Number of files transferred: 87 Total file size: 18029881002 bytes Total transferred file size: 18013399877 bytes Literal data: 123736377 bytes Matched data: 17889663500 bytes File list size: 43382 File list generation time: 0.020 seconds File list transfer time: 0.000 seconds Total bytes sent: 4361279 Total bytes received: 103628292 sent 4361279 bytes received 103628292 bytes 31406.01 bytes/sec total size is 18029881002 speedup is 166.96