> On Wed, 2020-03-25 at 14:39 +0000, Leroy Tennison wrote:
>> Since you state that using -z is almost always a bad idea, could you
>> provide the rationale for that? I must be missing something.
>>
> I think the "rationale" is that at some point the
> compression/decompression takes longer than the time reduction from
> sending a compressed file. It depends on the relative speeds of the
> machines and the network.
>
> You have most to gain from compressing large files, but if they are
> already compressed, then you have nothing to gain from just doing small
> files.
>
> It obviously depends on your network speed and if you have a metered
> connection, but does anyone really have such an ancient network
> connection still these days - I mean if you have fast enough machines
> at both ends to do rapid compression/decompression, it seems unlikely
> that you will have a damp piece of string connecting them.

I really don't understand the discussion here. What is wrong with using -z
with rsync? We're using rsync with -z for backups and just don't want to
waste bandwidth for nothing. We have better use for our bandwidth and it
makes quite a difference when backing up terabytes of data.

The only reason why I asked for help is because we don't want to double
compress data which is already compressed. This is what currently is
broken in rsync without manually specifying a skip-compress list. Fixing
it would help all those who don't know it's broken now.

Thanks,
Simon
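The manual workaround Simon refers to is passing rsync's `--skip-compress=LIST` option explicitly; per the rsync man page, LIST is a set of file suffixes without the leading dot, separated by slashes. The Python sketch below only builds such a command line. The suffix list and the paths are hypothetical placeholders, not rsync's built-in default list, so adjust them to your own data.

```python
import shlex

# Hypothetical suffixes of already-compressed formats (an assumption,
# not rsync's built-in default list).
ALREADY_COMPRESSED = ("gz", "bz2", "xz", "zip", "7z", "jpg", "png", "mp4")


def skip_compress_arg(suffixes):
    """Build an rsync --skip-compress option.

    rsync expects suffixes without the leading dot, separated by '/'.
    Empty entries are dropped.
    """
    cleaned = [s.lstrip(".") for s in suffixes if s.strip(".")]
    return "--skip-compress=" + "/".join(cleaned)


def rsync_backup_cmd(src, dest, suffixes=ALREADY_COMPRESSED):
    """Return the argv list for a compressed backup run (not executed)."""
    return ["rsync", "-az", skip_compress_arg(suffixes), src, dest]


if __name__ == "__main__":
    # Placeholder paths; the command is only printed, never run.
    print(shlex.join(rsync_backup_cmd("/srv/data/", "backup:/srv/data/")))
```

Building the option from a list keeps the suffix set in one place, so it can be reused across backup jobs.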
On Wed, 2020-03-25 at 19:15 +0100, Simon Matter via CentOS wrote:
> > On Wed, 2020-03-25 at 14:39 +0000, Leroy Tennison wrote:
> > > Since you state that using -z is almost always a bad idea, could you
> > > provide the rationale for that? I must be missing something.
> > >
> > I think the "rationale" is that at some point the
> > compression/decompression takes longer than the time reduction from
> > sending a compressed file. It depends on the relative speeds of the
> > machines and the network.
> >
> > You have most to gain from compressing large files, but if they are
> > already compressed, then you have nothing to gain from just doing small
> > files.
> >
> > It obviously depends on your network speed and if you have a metered
> > connection, but does anyone really have such an ancient network
> > connection still these days - I mean if you have fast enough machines
> > at both ends to do rapid compression/decompression, it seems unlikely
> > that you will have a damp piece of string connecting them.
>
> I really don't understand the discussion here. What is wrong with using -z
> with rsync? We're using rsync with -z for backups and just don't want to
> waste bandwidth for nothing. We have better use for our bandwidth and it
> makes quite a difference when backing up terabytes of data.

I don't really care if you use -z, but you asked for the rationale, and I
gave you it. I'm not telling you what you should do.

I'll try and make it simpler - if rsync takes 1 second to compress the
file, then 1 second to decompress the file, and the whole transfer of the
file takes 11 seconds uncompressed vs 10 seconds compressed, then dealing
with the file takes 12 seconds overall compressed, vs 11 seconds
uncompressed. It's not worth it.

But as I said it depends on your network and your machine speeds. It's
up to you to decide what is best in your own situation.

P.
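Pete's arithmetic can be written as a tiny model. This sketch assumes, as his example does, that compression, transfer and decompression happen one after another; in practice rsync compresses the stream as it sends, so CPU and network work may overlap, which this model ignores. The 50-second case is a hypothetical slow link, not a measurement from the thread.

```python
def sequential_time(transfer_s, compress_s=0.0, decompress_s=0.0):
    """Total seconds if compression, transfer and decompression run in sequence."""
    return compress_s + transfer_s + decompress_s


def compression_pays_off(plain_transfer_s, compressed_transfer_s,
                         compress_s, decompress_s):
    """True when the transfer time saved exceeds the CPU time spent."""
    saved = plain_transfer_s - compressed_transfer_s
    return saved > compress_s + decompress_s


# Pete's example: 11 s plain vs 10 s compressed, 1 s each to (de)compress.
plain = sequential_time(11.0)
compressed = sequential_time(10.0, compress_s=1.0, decompress_s=1.0)
print(plain, compressed)                      # 11.0 12.0
print(compression_pays_off(11, 10, 1, 1))     # False

# Hypothetical slow or saturated link: 50 s plain vs 10 s compressed.
print(compression_pays_off(50, 10, 1, 1))     # True
```

The break-even condition is simply "time saved on the wire > time spent compressing and decompressing", which is why the answer depends on the relative speeds of the machines and the network.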
That's why I asked: I wanted to know if there was something inherently bad
with "-z". I had a situation where PostgreSQL was replicating 16M files
every few minutes ("log shipping") on approximately 10 systems and fell
behind, which resulted in almost continuous file transfer (of mostly null
16M files) and saturated the common link. Enabling compression for the
file transfer cut transmission time by 5-10x, resolving the problem.

________________________________
From: CentOS <centos-bounces at centos.org> on behalf of Simon Matter via CentOS <centos at centos.org>
Sent: Wednesday, March 25, 2020 1:15 PM
To: CentOS mailing list <centos at centos.org>
Subject: [EXTERNAL] Re: [CentOS] Need help to fix bug in rsync

Harriscomputer

Leroy Tennison
Network Information/Cyber Security Specialist
E: leroy at datavoiceint.com
2220 Bush Dr
McKinney, Texas 75070
www.datavoiceint.com

This message has been sent on behalf of a company that is part of the Harris Operating Group of Constellation Software Inc. If you prefer not to be contacted by Harris Operating Group please notify us <http://subscribe.harriscomputer.com/>.

This message is intended exclusively for the individual or entity to which it is addressed. This communication may contain information that is proprietary, privileged or confidential or otherwise legally exempt from disclosure. If you are not the named addressee, you are not authorized to read, print, retain, copy or disseminate this message or any part of it. If you have received this message in error, please notify the sender immediately by e-mail and delete all copies of the message.

> On Wed, 2020-03-25 at 14:39 +0000, Leroy Tennison wrote:
>> Since you state that using -z is almost always a bad idea, could you
>> provide the rationale for that? I must be missing something.
>>
> I think the "rationale" is that at some point the
> compression/decompression takes longer than the time reduction from
> sending a compressed file. It depends on the relative speeds of the
> machines and the network.
>
> You have most to gain from compressing large files, but if they are
> already compressed, then you have nothing to gain from just doing small
> files.
>
> It obviously depends on your network speed and if you have a metered
> connection, but does anyone really have such an ancient network
> connection still these days - I mean if you have fast enough machines
> at both ends to do rapid compression/decompression, it seems unlikely
> that you will have a damp piece of string connecting them.

I really don't understand the discussion here. What is wrong with using -z
with rsync? We're using rsync with -z for backups and just don't want to
waste bandwidth for nothing. We have better use for our bandwidth and it
makes quite a difference when backing up terabytes of data.

The only reason why I asked for help is because we don't want to double
compress data which is already compressed. This is what currently is
broken in rsync without manually specifying a skip-compress list. Fixing
it would help all those who don't know it's broken now.

Thanks,
Simon
_______________________________________________
CentOS mailing list
CentOS at centos.org
https://lists.centos.org/mailman/listinfo/centos
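Leroy's result is plausible because mostly-null data compresses extremely well, while already-compressed data does not compress at all. The sketch below uses Python's `zlib` (deflate) as a stand-in for rsync's `-z` compression; that is an assumption, and rsync's actual ratios will differ. 16 MiB matches PostgreSQL's default WAL segment size, and `os.urandom` bytes stand in for already-compressed files.

```python
import os
import zlib

SIZE = 16 * 1024 * 1024  # 16 MiB, PostgreSQL's default WAL segment size


def compression_ratio(data, level=6):
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))


mostly_null = bytes(SIZE)       # all zero bytes, like the "mostly null" files
random_like = os.urandom(SIZE)  # incompressible, like already-compressed data

print(f"zero-filled: {compression_ratio(mostly_null):8.1f}x")
print(f"random     : {compression_ratio(random_like):8.3f}x")
```

The zero-filled buffer shrinks by a factor in the hundreds, while the random buffer comes out at roughly 1x (often slightly larger), which is exactly the "double compression" waste Simon wants to avoid.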
I appreciate the reply - it keeps me from wondering "is there something I
should be concerned about?". We use a co-location facility where we pay
for bandwidth utilization, so it's still an issue.

________________________________
From: CentOS <centos-bounces at centos.org> on behalf of Pete Biggs <pete at biggs.org.uk>
Sent: Wednesday, March 25, 2020 1:32 PM
To: centos at centos.org <centos at centos.org>
Subject: [EXTERNAL] Re: [CentOS] Need help to fix bug in rsync

On Wed, 2020-03-25 at 19:15 +0100, Simon Matter via CentOS wrote:
> > On Wed, 2020-03-25 at 14:39 +0000, Leroy Tennison wrote:
> > > Since you state that using -z is almost always a bad idea, could you
> > > provide the rationale for that? I must be missing something.
> > >
> > I think the "rationale" is that at some point the
> > compression/decompression takes longer than the time reduction from
> > sending a compressed file. It depends on the relative speeds of the
> > machines and the network.
> >
> > You have most to gain from compressing large files, but if they are
> > already compressed, then you have nothing to gain from just doing small
> > files.
> >
> > It obviously depends on your network speed and if you have a metered
> > connection, but does anyone really have such an ancient network
> > connection still these days - I mean if you have fast enough machines
> > at both ends to do rapid compression/decompression, it seems unlikely
> > that you will have a damp piece of string connecting them.
>
> I really don't understand the discussion here. What is wrong with using -z
> with rsync? We're using rsync with -z for backups and just don't want to
> waste bandwidth for nothing. We have better use for our bandwidth and it
> makes quite a difference when backing up terabytes of data.

I don't really care if you use -z, but you asked for the rationale, and I
gave you it. I'm not telling you what you should do.

I'll try and make it simpler - if rsync takes 1 second to compress the
file, then 1 second to decompress the file, and the whole transfer of the
file takes 11 seconds uncompressed vs 10 seconds compressed, then dealing
with the file takes 12 seconds overall compressed, vs 11 seconds
uncompressed. It's not worth it.

But as I said it depends on your network and your machine speeds. It's
up to you to decide what is best in your own situation.

P.
On 25.03.20 at 19:15, Simon Matter via CentOS wrote:
>> On Wed, 2020-03-25 at 14:39 +0000, Leroy Tennison wrote:
>>> Since you state that using -z is almost always a bad idea, could you
>>> provide the rationale for that? I must be missing something.
>>>
>> I think the "rationale" is that at some point the
>> compression/decompression takes longer than the time reduction from
>> sending a compressed file. It depends on the relative speeds of the
>> machines and the network.
>>
>> You have most to gain from compressing large files, but if they are
>> already compressed, then you have nothing to gain from just doing small
>> files.
>>
>> It obviously depends on your network speed and if you have a metered
>> connection, but does anyone really have such an ancient network
>> connection still these days - I mean if you have fast enough machines
>> at both ends to do rapid compression/decompression, it seems unlikely
>> that you will have a damp piece of string connecting them.
>
> I really don't understand the discussion here. What is wrong with using -z
> with rsync? We're using rsync with -z for backups and just don't want to
> waste bandwidth for nothing. We have better use for our bandwidth and it
> makes quite a difference when backing up terabytes of data.
>
> The only reason why I asked for help is because we don't want to double
> compress data which is already compressed. This is what currently is
> broken in rsync without manually specifying a skip-compress list. Fixing
> it would help all those who don't know it's broken now.
>

Until this is fixed, as a workaround I would do a two-pass transfer with
filters via a ".rsync-filter" file: first rsync -azvF for everything with
a high compression ratio, then rsync -av for everything, including the
compressed data. So ".rsync-filter" contains the exclude statements for
the compressed formats.

This only makes sense if the bandwidth saved by compression is greater
than the metadata transferred by the second run ...

--
Leon
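Leon's two-pass workaround can be sketched as below. `-F` makes rsync read per-directory `.rsync-filter` files, and `- PATTERN` is the short form of an exclude rule, so the first (compressed) pass skips the listed formats. The second pass has no `-z` and no `-F`, so it considers everything; files already copied in pass 1 should be skipped by rsync's default size/mtime quick check, leaving mostly the compressed formats plus the file-list metadata. The suffix list and paths are hypothetical, and the script only prints the filter file and commands rather than running anything.

```python
import shlex

# Hypothetical suffixes of already-compressed formats (an assumption).
COMPRESSED_SUFFIXES = ("gz", "bz2", "xz", "zip", "7z", "jpg", "png", "mp4")


def rsync_filter_text(suffixes=COMPRESSED_SUFFIXES):
    """Contents for a .rsync-filter file excluding compressed formats."""
    return "".join(f"- *.{s}\n" for s in suffixes)


def two_pass_commands(src, dest):
    """Argv lists for the two passes (returned, not executed).

    Pass 1: -z plus -F, so .rsync-filter keeps compressed formats out.
    Pass 2: plain -av, so the remaining (compressed) files are sent
    without rsync compression.
    """
    first = ["rsync", "-azvF", src, dest]
    second = ["rsync", "-av", src, dest]
    return first, second


if __name__ == "__main__":
    # Hypothetical paths; save the printed rules as .rsync-filter in the
    # source tree before running the commands by hand.
    print(rsync_filter_text(), end="")
    for cmd in two_pass_commands("/srv/data/", "backup:/srv/data/"):
        print(shlex.join(cmd))
```

Whether this beats a single `-z` run with an explicit `--skip-compress` list depends on how large the second pass's file list is relative to the bytes saved, which is the trade-off Leon points out.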