Hongyi Zhao
2015-Apr-04 07:21 UTC
Downloading a great number of files from different rsync servers for good loadbalancing and high efficiency.
Hi all,
I'm using Debian, and I want to build a local repository so that I can
install packages more conveniently.
Since rsync is the tool Debian officially recommends for syncing files
among its mirror sites, I use the rsync client to download the deb packages
from different rsync servers distributed around the world, for good load
balancing and high efficiency.
The steps are as follows:
1- Build the list of packages to download from the Packages.gz files for
the desired distribution and architecture. For example, for testing
(code name jessie) on the amd64 architecture, the following files can be
used to extract the package list:
https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/main/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/contrib/binary-all/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-amd64/Packages.gz
https://mirrors.ustc.edu.cn/debian/dists/jessie/non-free/binary-all/Packages.gz
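A minimal sketch of fetching these six index files, assuming bash and curl are available. The function names and the destination layout are hypothetical. Each file is stored under its own component/architecture directory, because all six share the name Packages.gz and would otherwise overwrite one another. Whether the mirror still serves these paths is not checked here.

```shell
#!/usr/bin/env bash
# Sketch only: function names and directory layout are illustrative assumptions.
base=https://mirrors.ustc.edu.cn/debian/dists/jessie

# Print the six index URLs listed above, one per line.
index_urls() {
    local comp arch
    for comp in main contrib non-free; do
        for arch in binary-amd64 binary-all; do
            printf '%s/%s/%s/Packages.gz\n' "$base" "$comp" "$arch"
        done
    done
}

# Download every index into $1/<component>/<arch>/Packages.gz.
fetch_indexes() {
    local dest=$1 url rel
    index_urls | while IFS= read -r url; do
        rel=${url#"$base"/}              # e.g. main/binary-amd64/Packages.gz
        mkdir -p "$dest/${rel%/*}"
        curl -fsS -o "$dest/$rel" "$url" # -f: treat HTTP errors as failures
    done
}
```

Calling `fetch_indexes /path/to/Packages.gz` would leave a directory tree that a recursive find for files named Packages.gz can walk.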
After downloading all of the above files, I use the following command to
extract the list of deb package filenames:
find /path/to/Packages.gz -type f -name Packages.gz -exec zcat \{\} + |
awk '/^Filename:/{ print $2 } ' > deb-file.list
At this point, the deb-file.list will contain a great number of lines
like the following:
----------
[snipped]
pool/main/m/mockobjects/libmockobjects-java-doc_0.09-5_all.deb
pool/main/s/subtitleeditor/subtitleeditor_0.33.0-3_amd64.deb
pool/main/h/haskell-hgl/libghc-hgl-prof_3.2.0.5-1_amd64.deb
pool/main/l/lsh-utils/lsh-doc_2.1-5_all.deb
pool/main/liba/libav/libswscale3_11.3-1_i386.deb
pool/main/s/smokeqt/libsmokeqtuitools4-3_4.12.2-2_amd64.deb
pool/main/libo/libotf/libotf0-dbg_0.9.13-2_amd64.deb
[snipped]
----------
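If the same Filename entry turns up in more than one index, the list will contain duplicates and the same file would be requested twice. A small sketch of the same awk extraction with duplicates removed follows; the function name is hypothetical, and it reads uncompressed Packages text on standard input.

```shell
#!/usr/bin/env bash
# Sketch only: extract_filenames is an illustrative name, not an existing tool.
# Reads Packages stanzas on stdin, prints each pool path exactly once.
extract_filenames() {
    awk '/^Filename:/ { print $2 }' | sort -u
}
```

It can take the place of the awk stage, for example:
find /path/to/Packages.gz -type f -name Packages.gz -exec zcat {} + | extract_filenames > deb-file.list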
2- Next, I obtain the list of all available rsync servers provided by
Debian and other open-source sites from here:
https://www.debian.org/CD/mirroring/rsync-mirrors
Note that although that page says these rsync mirrors are for Debian CD
images, most of them in fact also carry the non-CD sections of the Debian
repository, so I can use them for my purpose.
At this stage, I build the mirror list for my purpose as follows:
curl https://www.debian.org/CD/mirroring/rsync-mirrors 2>/dev/null |
awk '/::debian-cd\//{gsub(/debian-cd/,"debian",$NF); split($NF,a,"<"); print a[1] }' > mirrors.list
The content of the mirrors.list looks like the following:
----------------
[snipped]
debian.mirror.digitalpacific.com.au::debian-cd/
mirror.as24220.net::debian-cd/
mirror.intrapower.net.au::debian-cd/
mirror.rackcentral.com.au::debian-cd/
debian.anexia.at::debian-cd/
debian.sil.at::debian-cd/
[snipped]
----------------
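Since only most of these servers carry the non-CD archive, it may be worth checking each one before relying on it. A hedged sketch, assuming bash and an rsync client: running rsync against `host::` with no module name asks the daemon for its module listing, and the functions below keep only the hosts that advertise a module named debian. The function names are hypothetical. A daemon can hide modules from its listing, so a miss here does not prove that the module is absent.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative assumptions.

# Succeed if the module listing on stdin has a module named $1 in its first column.
listing_has_module() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Read "host::module/" entries on stdin; print "host::debian/" for each host
# whose rsync daemon lists a "debian" module.
filter_mirrors() {
    local entry host
    while IFS= read -r entry; do
        host=${entry%%::*}
        if rsync --contimeout=5 --timeout=10 "${host}::" </dev/null 2>/dev/null |
            listing_has_module debian; then
            printf '%s::debian/\n' "$host"
        fi
    done
}
```

For example, `filter_mirrors < mirrors.list > mirrors.checked` would write the verified entries to a second file.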
Currently, this method yields 94 available rsync servers, which are exactly
the contents of the file mirrors.list.
3- Finally, I use rsync to download all of the deb files listed in
deb-file.list, using all of the rsync servers stored in mirrors.list.
Because the administrators of most of these servers impose bandwidth and
max-connections limits, I want to download only one deb file at a time
from each rsync server. When a download from a given server finishes,
rsync should read the next deb file from deb-file.list, and so on, until
all of the deb files have been downloaded successfully by using all of
these rsync servers in parallel.
For this purpose I need a script. I struggled for some time to come up with
the following one, but it cannot meet all of the above requirements. In
fact, it is a long way from achieving the requirements described in step 3
above:
-------------------
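mirror=1
while read -r -a line
do
    # Pick the mirror on line $mirror of mirrors.list
    # (empty once $mirror exceeds the number of mirrors).
    mirror_used=$(awk -v n="$mirror" 'NR==n' mirrors.list)
    # The source is the mirror module immediately followed by the pool path.
    rsync -amH --progress --append-verify --timeout=10 --contimeout=5 \
        "${mirror_used}${line[0]}" debs/ &
    mirror=$((mirror+1))
done < deb-file.list
wait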
-------------------
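One possible way to get one download at a time per mirror, with each mirror taking the next file as soon as it is free, is to start one background worker per mirror and let all workers pull lines from a single shared queue. The sketch below assumes bash and flock(1) from util-linux. All function names and the failed.list file are hypothetical, and it has not been run against the real mirrors. The rsync flags are the ones from the script above, minus --progress.

```shell
#!/usr/bin/env bash
# Sketch only: one worker per mirror, all workers share one queue of paths.

# Fetch a single file ($2) from a single mirror ($1) into directory $3.
fetch_one() {
    rsync -amH --append-verify --timeout=10 --contimeout=5 "${1}${2}" "${3}/"
}

# Worker loop: take the next path from fd 3 under a lock, then fetch it.
worker() {
    local mirror=$1 dest=$2 lock=$3 path
    exec 9>>"$lock"          # opened per worker, so flock excludes the others
    while :; do
        flock 9
        if ! IFS= read -r path <&3 && [ -z "$path" ]; then
            flock -u 9       # queue exhausted
            break
        fi
        flock -u 9
        [ -n "$path" ] || continue
        # Record failures so they can be retried later.
        fetch_one "$mirror" "$path" "$dest" ||
            printf '%s\t%s\n' "$mirror" "$path" >>"$dest/failed.list"
    done
}

# download_all MIRRORS_FILE LIST_FILE DEST_DIR
download_all() {
    local mirrors=$1 list=$2 dest=$3 lock mirror
    mkdir -p "$dest"
    lock=$(mktemp)
    exec 3<"$list"           # workers inherit fd 3 and share its read offset
    while IFS= read -r mirror <&4; do
        [ -n "$mirror" ] && worker "$mirror" "$dest" "$lock" &
    done 4<"$mirrors"
    wait
    exec 3<&-
    rm -f "$lock"
}
```

It would be invoked as `download_all mirrors.list deb-file.list debs`. Because each worker is bound to one mirror and fetches sequentially, no mirror ever sees more than one connection from this script, and a slow mirror simply ends up handling fewer files.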
Any hints for this issue?
Regards
--
Hongyi Zhao <hongyi.zhao at gmail.com>
Xinjiang Technical Institute of Physics and Chemistry
Chinese Academy of Sciences
GnuPG DSA: 0xD108493
Karl O. Pinc
2015-Apr-04 08:20 UTC
Downloading a great number of files from different rsync servers for good loadbalancing and high efficiency.
On Sat, 4 Apr 2015 15:21:21 +0800 Hongyi Zhao <hongyi.zhao at gmail.com> wrote:
> I'm using Debian, I want to make a local repository which can let me
> install packages more conveniently.
Your solution will not work for mirroring debian since it does not do a
2-stage mirroring process. This is described in:
https://www.debian.org/mirror/ftpmirrors
Further, your solution is a bad idea for many reasons. If you want to know
more about this I suggest asking on the Debian mailing lists or on the
#debian irc channel on irc.freenode.net.
Better would be to use the Debian recommended ftpsync script. This can be
found at:
https://ftp-master.debian.org/ftpsync.tar.gz
The instructions are at:
https://www.debian.org/mirror/ftpmirrors
The Debian people know how to best mirror Debian. Best to follow their
guidance.
Depending on your purposes you might not even want a mirror, you might be
better served with a cache. Again, ask the Debian people for guidance.
Regards,
Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein