Hi,

I am using rsync for backups onto the disks of a Linux backup server.
Obviously the server could store more data if the data were compressed.
I read the "rsync -> tar" thread. Unfortunately, a compressed file
system for Linux does not seem to exist yet. However, rsync can use
compression for file transfer.

Would it be possible to implement an option to store the data on the
backup server in the compressed chunks used for the file transfer?
It would save space and should also speed up a later rsynchronization.

Matt
--
--------------------------------
Matthias Munnich
Univ. of California, Los Angeles
Inst. of Geophysics and Planetary Physics
No! Only the sender side has to compress the data. The comparison
could be done in the compressed data format. With the -z option
the sender compresses the data anyway. The checksum test should
be faster for the smaller compressed pieces.

Matt

diburim wrote:
>
> I guess it is not so simple. Because next time you run rsync,
> each file will have to be decompressed for comparison.
>
> Dib
>
> ----- Original Message -----
> From: "Matthias Munnich" <munnich@atmos.ucla.edu>
> To: <rsync@lists.samba.org>
> Sent: Wednesday, May 22, 2002 9:47 PM
> Subject: Compressed backup
>
> > Hi
> >
> > I am using rsync for backups onto the disks of a Linux backup server.
> > Obviously the server could store more data if the data were
> > compressed. I read the "rsync -> tar" thread. Unfortunately, a
> > compressed file system for Linux does not seem to exist yet.
> > However, rsync can use compression for file transfer.
> >
> > Would it be possible to implement an option to store the data on the
> > backup server in the compressed chunks used for the file transfer?
> > It would save space and should also speed up any later rsynchronization.
> >
> > Matt
> > --
> > --------------------------------
> > Matthias Munnich
> > Univ. of California, Los Angeles
> > Inst. of Geophysics and Planetary Physics
> >
> > --
> > To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
> > Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
>

--
--------------------------------
Matthias Munnich
Univ. of California, Los Angeles
Matthias Munnich [munnich@atmos.ucla.edu] writes:

> No! Only the sender side has to compress the data. The comparison
> could be done in the compressed data format. With the -z option
> the sender compresses the data anyway. The checksum test should
> be faster for the smaller compressed pieces.

Except that you'll probably end up retransmitting the whole thing due
to the change in compressed output. Since a compression function is
essentially a data randomizer (the better the compression, the better
the randomization of the output), tiny changes in input can result in
huge changes in output. That's the traditional problem of trying to
use an algorithm like rsync's with compressed file formats.

You really need to apply the rsync algorithm to the uncompressed files
if you hope to gain any real efficiencies in terms of reduction of
traffic transmitted.

-- David

/-----------------------------------------------------------------------\
 \              David Bolen             \   E-mail: db3l@fitlinxx.com   /
  |            FitLinxx, Inc.            \  Phone: (203) 708-5192      |
 /  860 Canal Street, Stamford, CT 06902  \  Fax: (203) 316-5150        \
\-----------------------------------------------------------------------/
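[Editorial note: David's point is easy to demonstrate. This is a small
sketch, not anything from rsync itself: compress two inputs that differ
by a single flipped bit near the start and compare the resulting
streams. The test data and bit position are arbitrary choices made for
illustration.]

```python
import zlib

original = b"The quick brown fox jumps over the lazy dog. " * 2000
modified = bytearray(original)
modified[10] ^= 0x01               # flip a single bit near the start
modified = bytes(modified)

ca = zlib.compress(original)
cb = zlib.compress(modified)

# Measure how long the two compressed streams agree before diverging.
prefix = 0
for x, y in zip(ca, cb):
    if x != y:
        break
    prefix += 1

print(f"compressed sizes: {len(ca)} vs {len(cb)}; common prefix: {prefix} bytes")
```

The common prefix is only a handful of header bytes: one flipped input
bit perturbs the compressed stream from almost the first byte onward,
so rsync's block-checksum matching on the compressed form would find
essentially nothing in common and retransmit everything.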
On Wed, May 22, 2002 at 12:47:00PM -0700, Matthias Munnich wrote:
> Hi
>
> I am using rsync for backups onto the disks of a Linux backup server.
> Obviously the server could store more data if the data were
> compressed. I read the "rsync -> tar" thread. Unfortunately, a
> compressed file system for Linux does not seem to exist yet.
> However, rsync can use compression for file transfer.
>
> Would it be possible to implement an option to store the data on the
> backup server in the compressed chunks used for the file transfer?
> It would save space and should also speed up a later rsynchronization.

Despite what its home page says, from what I gather e2compr is in
active development and a fair number of people are using it. Try
searching the recent (last 8 weeks) lkml archives to find out where it
has moved.

If you are attempting compression in order to store multiple backup
"snapshots", take a look at linking them. I do so using an rsync patch
(--link-dest) and on a fairly active area get the same effect as 7:1
compression, but without actually compressing anything. Less active
trees will require even less space.

The patch, as well as the Dirvish backup system, can be had at
http://www.pegasys.ws/dirvish/

--
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt
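[Editorial note: the space saving from linked snapshots comes from
plain hard links: an unchanged file appears in every snapshot
directory but occupies disk space only once. The --link-dest patch
automates the compare-and-link step, roughly
`rsync -a --link-dest=../2002-05-21 src/ 2002-05-22/`. Here is a toy
illustration of the underlying mechanism; the directory and file names
are made up.]

```python
import os
import tempfile

root = tempfile.mkdtemp()
day1 = os.path.join(root, "2002-05-21")
day2 = os.path.join(root, "2002-05-22")
os.makedirs(day1)
os.makedirs(day2)

# Day 1: the file is stored normally.
path1 = os.path.join(day1, "report.txt")
with open(path1, "wb") as f:
    f.write(b"contents that did not change between backup runs\n")

# Day 2: the file is unchanged, so hard-link it instead of copying.
# Both snapshots now list the file, but there is only one inode.
path2 = os.path.join(day2, "report.txt")
os.link(path1, path2)

st1, st2 = os.stat(path1), os.stat(path2)
print(st1.st_ino == st2.st_ino, st1.st_nlink)   # same inode, link count 2
```

Deleting one snapshot just drops the link count; the data survives as
long as any snapshot still references it, which is why a mostly-static
tree can show a 7:1 (or better) space ratio with no compression at all.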
I've got a suggestion regarding the mail Kevin wrote: instead of
comparing the least m bits of n bytes, I'd suggest using an algorithm
as described in the paper

    "Siff -- Finding Similar Files in a Large File System"
    http://webglimpse.org/publications.html
    ftp://ftp.cs.arizona.edu/reports/1993/TR93-33.ps

(Search for "Manber finding" in google for more references.)

This calculates a rolling checksum over n bytes (n can be chosen). In
the paper, an action is taken when m bits of this checksum equal a
defined value, similar to

    if (!(checksum & 0xff)) { ... }

for 8 bits = 0, which gives a 1/256 chance. I wrote the program
described there (siff) and it works very well.

Regards,
Phil