Hongyi Zhao
2015-Apr-04 07:21 UTC
Downloading a great number of files from different rsync servers for good load balancing and high efficiency.
Hi all,

I'm using Debian and want to build a local repository so that I can install packages more conveniently. Since rsync is the tool Debian officially recommends for syncing files between its mirror sites, I use the rsync client to download the .deb packages from many different rsync servers around the world, for good load balancing and high efficiency. The steps are as follows:

1- Build the list of packages to download from the Packages.gz files for the relevant distribution and architecture. For example, for testing (code name jessie) on amd64, the package list can be extracted from the following files:

https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-all/Packages.gz

After downloading all of the above files, I extract the list of .deb file names with the following command:

find /path/to/Packages.gz -type f -name Packages.gz -exec zcat {} + | awk '/^Filename:/ { print $2 }' > deb-file.list

At this point, deb-file.list contains a large number of lines like these:

----------
[snipped]
pool/main/m/mockobjects/libmockobjects-java-doc_0.09-5_all.deb
pool/main/s/subtitleeditor/subtitleeditor_0.33.0-3_amd64.deb
pool/main/h/haskell-hgl/libghc-hgl-prof_3.2.0.5-1_amd64.deb
pool/main/l/lsh-utils/lsh-doc_2.1-5_all.deb
pool/main/liba/libav/libswscale3_11.3-1_i386.deb
pool/main/s/smokeqt/libsmokeqtuitools4-3_4.12.2-2_amd64.deb
pool/main/libo/libotf/libotf0-dbg_0.9.13-2_amd64.deb
[snipped]
----------

2- Next, I get the list of all available rsync servers provided by Debian and other open-source sites from here:

https://www.debian.org/CD/mirroring/rsync-mirrors

Note that although that page says these rsync mirrors are for Debian CD images, in fact most of them also carry the regular (non-CD) sections of the Debian archive, so I can use them for my purpose as well. I build my mirror list as follows:

curl https://www.debian.org/CD/mirroring/rsync-mirrors 2>/dev/null | awk '/::debian-cd\//{ gsub(/debian-cd/, "debian", $NF); split($NF, a, "<"); print a[1] }' > mirrors.list

The content of mirrors.list looks like this:

----------------
[snipped]
debian.mirror.digitalpacific.com.au::debian-cd/
mirror.as24220.net::debian-cd/
mirror.intrapower.net.au::debian-cd/
mirror.rackcentral.com.au::debian-cd/
debian.anexia.at::debian-cd/
debian.sil.at::debian-cd/
[snipped]
----------------

With this method I currently get 94 available rsync servers, which make up the content of mirrors.list.

3- Finally, I use rsync to download all of the .deb files listed in deb-file.list, spread across all of the rsync servers in mirrors.list. Because most of these servers' administrators impose bandwidth and maximum-connection limits, I want to download only one .deb file from each server at a time. When the download from a given server finishes, rsync should read the next .deb file from deb-file.list for that server, and so on, until all of the .deb files have been downloaded successfully, using all of the servers in parallel.

This requires a script. I've tried the following one, which took me some time to put together, but it doesn't meet all of the above requirements.
In fact, it falls far short of the requirements described in step 3 above:

-------------------
nmirrors=$(wc -l < mirrors.list)
mirror=1
while read -r -a line
do
  # Pick the next mirror round-robin. Every rsync is started in the
  # background at once, so nothing limits a mirror to one transfer at a time.
  mirror_used=$(awk -v n="$mirror" 'NR == n' mirrors.list)
  rsync -amH --progress --append-verify --timeout=10 --contimeout=5 \
    "${mirror_used}${line[0]}" debs/ &
  mirror=$(( mirror % nmirrors + 1 ))
done < deb-file.list
wait
-------------------

Any hints for this issue?

Regards
--
Hongyi Zhao <hongyi.zhao at gmail.com>
Xinjiang Technical Institute of Physics and Chemistry
Chinese Academy of Sciences
GnuPG DSA: 0xD108493

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.samba.org/pipermail/rsync/attachments/20150404/6ea2a383/attachment.html>
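The scheduling asked for in step 3 (each mirror handles at most one transfer at a time and picks up the next file as soon as its previous one finishes) can be sketched in bash as a token pool. This is a minimal sketch under stated assumptions, not a tested tool: a FIFO holds one token (the mirror name) per idle mirror, the main loop takes a token before starting each background download, and the download hands the token back when it ends. The helper names fetch_one and download_all are hypothetical; the rsync options are a subset of those in the script above; it assumes bash 4.1 or later, mirror entries ending in "::debian/", and a mirror list small enough to fit in the pipe buffer.

```bash
#!/usr/bin/env bash
# Sketch: download every path in a list, letting each mirror serve at most
# one transfer at a time. A FIFO holds one token (the mirror name) per idle
# mirror; a download starts only after taking a token and returns it when
# done. Needs bash >= 4.1 for the {var} redirections.

# fetch_one MIRROR PATH -- hypothetical helper; redefine it as needed.
# Assumes MIRROR ends in "::debian/" and PATH is a pool/... path.
fetch_one() {
  rsync -a --timeout=10 --contimeout=5 "$1$2" debs/
}

# download_all FILE_LIST MIRROR_LIST
download_all() {
  local list=$1 mirrors=$2 dir pool files mirror path n=0
  [ -r "$list" ] && [ -r "$mirrors" ] || return 1
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/pool" || return 1
  exec {pool}<>"$dir/pool"   # read-write open, so it does not block
  rm -rf "$dir"              # the open descriptor keeps the FIFO usable

  # One token per mirror. Assumes the whole list fits in the pipe buffer
  # (64 KiB by default on Linux, plenty for ~100 mirrors).
  while IFS= read -r mirror || [ -n "$mirror" ]; do
    [ -n "$mirror" ] || continue
    printf '%s\n' "$mirror" >&"$pool"
    n=$((n + 1))
  done < "$mirrors"
  if [ "$n" -eq 0 ]; then    # no mirrors: reading a token would block forever
    exec {pool}>&-
    return 1
  fi

  exec {files}<"$list"
  while IFS= read -r path <&"$files" || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    IFS= read -r mirror <&"$pool"   # blocks until some mirror is idle
    {
      fetch_one "$mirror" "$path" ||
        printf 'FAILED %s%s\n' "$mirror" "$path" >&2
      printf '%s\n' "$mirror" >&"$pool"   # mark the mirror idle again
    } &
  done
  wait                              # let the last downloads finish
  exec {files}<&- {pool}>&-
}
```

Usage, after creating the debs/ directory: source the functions, then run download_all deb-file.list mirrors.list. Because mirrors are handed out on demand rather than by pre-splitting deb-file.list round-robin, a slow mirror simply ends up serving fewer files; failed transfers are only reported on stderr, not retried.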
Karl O. Pinc
2015-Apr-04 08:20 UTC
On Sat, 4 Apr 2015 15:21:21 +0800
Hongyi Zhao <hongyi.zhao at gmail.com> wrote:

> I'm using Debian, I want to make a local repository which can let me
> install packages more conveniently.

Your solution will not work for mirroring Debian, since it does not do the 2-stage mirroring process described in:

https://www.debian.org/mirror/ftpmirrors

Further, your solution is a bad idea for many reasons. If you want to know more about this, I suggest asking on the Debian mailing lists or on the #debian IRC channel on irc.freenode.net.

Better would be to use the Debian-recommended ftpsync script. It can be found at:

https://ftp-master.debian.org/ftpsync.tar.gz

The instructions are at:

https://www.debian.org/mirror/ftpmirrors

The Debian people know how best to mirror Debian. Best to follow their guidance.

Depending on your purposes, you might not even want a mirror; you might be better served by a cache. Again, ask the Debian people for guidance.

Regards,

Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward."
 -- Robert A. Heinlein
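The 2-stage process Karl refers to can be illustrated with a short bash sketch. The idea is that stage 1 fetches new files (mostly under pool/) while skipping the index files, and stage 2 then fetches the indices and deletes files that have disappeared upstream, so the local Packages and Release files never point at .deb files that have not arrived yet. The function name two_stage_sync, the exclude patterns, and the rsync options are illustrative assumptions, not ftpsync's actual settings; for a real mirror, ftpsync remains the tool to use.

```bash
#!/usr/bin/env bash
# Sketch of a two-stage rsync mirror run (illustrative, not ftpsync).
# RSYNC can be overridden, e.g. RSYNC=echo to print the commands instead.
RSYNC=${RSYNC:-rsync}

# two_stage_sync SRC DEST -- hypothetical helper, e.g. SRC=host::debian/
two_stage_sync() {
  local src=$1 dest=$2
  # Index files: assumed patterns, not copied from ftpsync.
  local -a index_excludes=(
    --exclude='Packages*' --exclude='Sources*'
    --exclude='Release*' --exclude='InRelease'
    --exclude='i18n/*' --exclude='ls-lR*'
  )
  # Stage 1: new files (mostly pool/), no indices, nothing deleted yet.
  "$RSYNC" -rlptH --safe-links "${index_excludes[@]}" "$src" "$dest" ||
    return 1
  # Stage 2: everything, including the indices; stale files deleted last.
  "$RSYNC" -rlptH --safe-links --delete-after "$src" "$dest"
}
```

For example, RSYNC=echo two_stage_sync some.mirror::debian/ /srv/mirror/debian/ prints the two commands without running them (the host name is a placeholder). Stage 2 runs only if stage 1 succeeded, so a failed first pass never publishes new indices.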