Hi folks,

I've just finished rsyncing/downloading/jigdoizing the entire i386/x86_64
CentOS 4.2 distribution.

If anyone is interested go to

http://mirror.tcs.ii.uj.edu.pl/jigdo/

You'll need to edit the .jigdo file by hand to change the server section

[Servers]
CentOS42=file:/opt/mirrors/centos/4.2/

to point to a local mirror (file, http or ftp), i.e. to use kernel.org:

[Servers]
CentOS42=http://mirrors.kernel.org/centos/4.2/

While doing this I have come upon a few questions:

a) it seems the server CDs have a lot of stuff not present in the normal
directory mirror, I guess this is an artifact of the build process?
[the template files for the server CDs are ~120MB]

b) what are the .newheaders and .repodata directories on i386 CD1?

c) why do the mirror repodata/*.xml.gz files match neither the CD nor
DVD versions for i386?

d) why does the i386 DVD not match ideally, but the x86_64 DVD matches
for _all_ files? The x86_64 CD1 also matches _much_ better than the i386
CD1...

e) why aren't identical files between the two trees hardlinked?

$ ls -ali os/*/CentOS/RPMS/yum*noarch*
278532 -rw-rw-r-- 1 maze maze 395922 Sep 4 19:48 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
1165388 -rw-rw-r-- 1 maze maze 395922 Oct 10 22:20 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm

$ md5sum os/*/CentOS/RPMS/yum*noarch*
371d55a19f8e4ca13d22974128ab4671 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
371d55a19f8e4ca13d22974128ab4671 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm

Just an example of two identical files from my mirror, one of which is
wasting space even though contents are identical. I expect we have this
situation for almost _all_ i386 packages from the x86_64 distribution...

$ pwd
/opt/mirrors/centos/4.2/os/x86_64
$ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
$ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
440745010
$ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
$ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
426816227

$ pwd
/opt/mirrors/centos/4.2/updates/x86_64
$ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
$ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
12819616
$ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
diff: ../i386/./RPMS/createrepo-0.4.3-1.noarch.rpm: No such file or directory
$ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
2164495

$ ls RPMS/createrepo-0.4.3-1.noarch.rpm -al
-rw-rw-r-- 2 maze maze 18284 Sep 5 13:59 RPMS/createrepo-0.4.3-1.noarch.rpm

That seems to me to be an 880 MB mirror space saving to be made there...
Considering the i386/x86_64 mirror takes up 7.7GB (without ISOs) that's
quite a bit...

I also imagine the noarch files are shared with most of the other
architectures... so I'd assume another 400MB can be saved for each
additional arch...

f) Why aren't jigdo files available on the site? They'd really come in
useful, especially in the situation I had where I already had a complete
mirror of all the files, but I still had to bittorrent the CDs/DVDs even
though I had 99% of the required data on disk!

Cheers,
MaZe.
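
The measurement in the message above (compare each package in one tree with the same relative path in the other, then total the bytes) can be sketched as one reusable function. This is only an illustration, not the commands the poster ran: the function name dup_bytes and its three arguments are made up for the example, the reference tree must be given as an absolute path, and file names are assumed to contain no newlines.

```shell
#!/bin/sh
# dup_bytes TREE REF_TREE PATTERN   (hypothetical helper, not from the thread)
# Prints the total size in bytes of the files under TREE that match PATTERN
# and are byte-for-byte identical to the file at the same relative path
# under REF_TREE. REF_TREE must be an absolute path, because the function
# changes into TREE before comparing.
dup_bytes() (
    cd "$1" || exit 1
    find . -type f -name "$3" | {
        total=0
        while IFS= read -r f; do
            # Count the file only if the counterpart exists and is identical.
            if [ -f "$2/$f" ] && cmp -s "$f" "$2/$f"; then
                size=$(wc -c < "$f")
                total=$((total + size))
            fi
        done
        echo "$total"
    }
)

# Example call, using the paths from the message as an assumption:
#   dup_bytes /opt/mirrors/centos/4.2/os/x86_64 \
#             /opt/mirrors/centos/4.2/os/i386 '*.noarch.rpm'
```

Unlike the loops in the message, this counts only files that really are identical, so a package missing from the reference tree (such as createrepo in updates) is left out of the total. For reference, the four totals quoted in the message add up to 882,545,348 bytes, which is consistent with the estimate of roughly 880 MB.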
-------------- next part --------------
-----=====----- CentOS 4.2 i386 CD1 -----=====-----
/CentOS/base/hdlist
/CentOS/base/hdlist2
/CentOS/base/hdstg2.img
/.discinfo
/headers/header.info
/isolinux/boot.cat
Only in 1: .newheaders
/repodata/filelists.xml.gz
/repodata/other.xml.gz
/repodata/primary.xml.gz
/repodata/repomd.xml
Only in 1: .repodata

-----=====----- CentOS 4.2 i386 CD2 -----=====-----
/.discinfo

-----=====----- CentOS 4.2 i386 CD3 -----=====-----
/.discinfo

-----=====----- CentOS 4.2 i386 CD4 -----=====-----
/.discinfo

-----=====----- CentOS 4.2 i386 DVD -----=====-----
/headers/header.info
/isolinux/boot.cat
/isolinux/isolinux.bin
/repodata/filelists.xml.gz
/repodata/other.xml.gz
/repodata/primary.xml.gz
/repodata/repomd.xml

-----=====----- CentOS 4.2 i386 ServerCD -----=====-----
/CentOS/base/comps.rpm
/CentOS/base/comps.xml
/CentOS/base/hdlist
/CentOS/base/hdlist2
/CentOS/base/hdstg2.img
/CentOS/base/netstg2.img
Only in Server/CentOS/base: product.img
/CentOS/base/stage2.img
Only in Server/CentOS/RPMS: anaconda-product-4.0-2.centos4.1.noarch.rpm
Only in Server/CentOS/RPMS: comps-4.2CENTOS-1.20051106.i386.rpm
Only in Server/CentOS/RPMS: rpmdb-CentOS-4.2-0.20051106.i386.rpm
/.discinfo
/images/boot.iso
/images/diskboot.img
/images/pxeboot/initrd.img
/images/pxeboot/README
/images/pxeboot/vmlinuz
/images/README
/isolinux/boot.cat
/isolinux/initrd.img
/isolinux/isolinux.bin
/isolinux/vmlinuz
/RELEASE-NOTES-en.html
Only in Server: RPM-GPG-KEY-CentOS-4
Only in Server: SRPMS

-------------- next part --------------
-----=====----- CentOS 4.2 x86_64 CD1 -----=====-----
/CentOS/base/hdlist
/CentOS/base/hdlist2
/.discinfo
/isolinux/boot.cat
/isolinux/isolinux.bin

-----=====----- CentOS 4.2 x86_64 CD2 -----=====-----
/.discinfo

-----=====----- CentOS 4.2 x86_64 CD3 -----=====-----
/.discinfo

-----=====----- CentOS 4.2 x86_64 CD4 -----=====-----
/.discinfo

-----=====----- CentOS 4.2 x86_64 DVD -----=====-----

-----=====----- CentOS 4.2 x86_64 ServerCD -----=====-----
/CentOS/base/comps.rpm
/CentOS/base/comps.xml
/CentOS/base/hdlist
/CentOS/base/hdlist2
/CentOS/base/hdstg2.img
/CentOS/base/netstg2.img
Only in Server/CentOS/base: product.img
/CentOS/base/stage2.img
Only in Server/CentOS/RPMS: comps-4.2CENTOS-0.20051123.x86_64.rpm
Only in Server/CentOS/RPMS: rpmdb-CentOS-4.2-0.20051123.x86_64.rpm
/.discinfo
/images/boot.iso
/images/diskboot.img
/images/pxeboot/initrd.img
/images/pxeboot/README
/isolinux/boot.cat
/isolinux/initrd.img
/isolinux/isolinux.bin
Only in Server: SRPMS

Hmm, no comments on this thread?

Cheers,
MaZe.

On Fri, 2005-12-30 at 00:00 +0100, Maciej Żenczykowski wrote:
> Hi folks,
>
> I've just finished rsyncing/downloading/jigdoizing the entire i386/x86_64
> CentOS 4.2 distribution.
>
> If anyone is interested go to
>
> http://mirror.tcs.ii.uj.edu.pl/jigdo/
>
> You'll need to edit the .jigdo file by hand to change the server section
>
> [Servers]
> CentOS42=file:/opt/mirrors/centos/4.2/
>
> to point to a local mirror (file, http or ftp), i.e. to use kernel.org:
>
> [Servers]
> CentOS42=http://mirrors.kernel.org/centos/4.2/
>
> While doing this I have come upon a few questions:
>
> a) it seems the server CDs have a lot of stuff not present in the normal
> directory mirror, I guess this is an artifact of the build process?
> [the template files for the server CDs are ~120MB]
>
> b) what are the .newheaders and .repodata directories on i386 CD1?
>
> c) why do the mirror repodata/*.xml.gz files match neither the CD nor
> DVD versions for i386?

There was an issue after tree dissemination that required yum-arch and
createrepo to be run again on the main tree. This may happen from time
to time due to mirror rsync issues.

>
> d) why does the i386 DVD not match ideally, but the x86_64 DVD matches
> for _all_ files? The x86_64 CD1 also matches _much_ better than the i386
> CD1...

There was a need to rerun yum-arch and createrepo on the tree after the
ISOs were released ... that may or may not be the cause of the
differences. However, from a yum and up2date perspective, the i386 tree,
DVD, and CD set are the same.

Did I mention that we don't have 5 million dollars or 500 programmers to
produce CentOS? All the trees and mirrors are donated ... and all the
developers donate their time and machines to make this happen. I do the
best job I can to make this a good and FREE distro, as do all the other
devels.

>
> e) why aren't identical files between the two trees hardlinked?
>
> $ ls -ali os/*/CentOS/RPMS/yum*noarch*
> 278532 -rw-rw-r-- 1 maze maze 395922 Sep 4 19:48 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> 1165388 -rw-rw-r-- 1 maze maze 395922 Oct 10 22:20 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
>
> $ md5sum os/*/CentOS/RPMS/yum*noarch*
> 371d55a19f8e4ca13d22974128ab4671 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> 371d55a19f8e4ca13d22974128ab4671 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
>
> Just an example of two identical files from my mirror, one of which is
> wasting space even though contents are identical. I expect we have this
> situation for almost _all_ i386 packages from the x86_64 distribution...
>

We run a program called hardlink++ on the master mirror that should hard
link files that are identical. If it is not hardlinking those, it should
be. Are you using the -H option when rsyncing down?

> $ pwd
> /opt/mirrors/centos/4.2/os/x86_64
> $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> 440745010
> $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> 426816227
>
> $ pwd
> /opt/mirrors/centos/4.2/updates/x86_64
> $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> 12819616
> $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> diff: ../i386/./RPMS/createrepo-0.4.3-1.noarch.rpm: No such file or directory
> $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> 2164495
>
> $ ls RPMS/createrepo-0.4.3-1.noarch.rpm -al
> -rw-rw-r-- 2 maze maze 18284 Sep 5 13:59 RPMS/createrepo-0.4.3-1.noarch.rpm
>
> That seems to me to be an 880 MB mirror space saving to be made there...
> Considering the i386/x86_64 mirror takes up 7.7GB (without ISOs) that's
> quite a bit...
>
> I also imagine the noarch files are shared with most of the other
> architectures... so I'd assume another 400MB can be saved for each
> additional arch...

One thing to please remember is that we develop these files from
separate locations on separate machines, so they have to be stand-alone
on those machines initially ... we then combine them together on the
mirror and run hardlink++. That SHOULD hardlink all the files that are
the same.

>
> f) Why aren't jigdo files available on the site? They'd really come in
> useful, especially in the situation I had where I already had a complete
> mirror of all the files, but I still had to bittorrent the CDs/DVDs even
> though I had 99% of the required data on disk!

I don't know how to do jigdo files ... however, I am willing to learn.
Fedora and Red Hat don't, to my knowledge, create or distribute jigdo
files ... so this is not something that we would normally do. There are
lots of things that we don't do ... maybe we need 48 hour days :)

I am willing to learn what jigdo is all about ... but for now I am
totally ignorant.
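
For a local mirror that ended up with separate copies, the hardlinking discussed in the reply above could be approximated with a short script. This is a minimal sketch and not how hardlink++ works (its behaviour is not described in the thread): the names link_identical and inode are made up for the example, the reference tree must be an absolute path on the same filesystem, file names are assumed to contain no newlines, and differences in ownership, permissions or timestamps are ignored, so the linked name takes on the metadata of the reference file.

```shell
#!/bin/sh
# inode FILE   (hypothetical helper): print the inode number of FILE.
inode() {
    ls -i "$1" | awk '{print $1}'
}

# link_identical TREE REF_TREE PATTERN   (hypothetical helper)
# For every file under TREE matching PATTERN that is byte-for-byte identical
# to the file at the same relative path under REF_TREE, replace the copy in
# TREE with a hard link to the REF_TREE file. REF_TREE must be an absolute
# path on the same filesystem as TREE.
link_identical() (
    cd "$1" || exit 1
    find . -type f -name "$3" | while IFS= read -r f; do
        ref="$2/$f"
        # Skip files that have no counterpart in the reference tree.
        [ -f "$ref" ] || continue
        # Skip pairs that already share an inode.
        [ "$(inode "$f")" = "$(inode "$ref")" ] && continue
        # Link only when the contents are identical.
        if cmp -s "$f" "$ref"; then
            ln -f "$ref" "$f"
        fi
    done
)

# Example call, using the paths from the thread as an assumption:
#   link_identical /opt/mirrors/centos/4.2/os/x86_64 \
#                  /opt/mirrors/centos/4.2/os/i386 '*.noarch.rpm'
```

On the rsync side, -H (--hard-links) preserves hard links only among files that are part of the same transfer, so the i386 and x86_64 trees would need to be fetched in a single rsync run for links between them to survive.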