Bryan J. Smith
2005-Sep-09 15:20 UTC
[CentOS] [OT] Concept: RepoDELTA (simple createrepo hack)
I'm not on the YUM lists, so I'll post this here in the hope it will make someone happy. I'll forward the concept to Seth for consideration (if he doesn't see it here).

This is a simple "createrepo" hack that would require corresponding support in the "yum" binary as well. I'm not sure this would work, but it's just an idea I figure I might as well throw out.

PROBLEM

Users want to be able to access the "state" of a YUM repository at any arbitrary point in time or by a known tag.

REQUIREMENTS

In order for this to work, there are 3 requirements:

1. Append-only Repository

No package deletions: packages are only added to the YUM repository, never removed (at least not without careful consideration -- i.e., not recommended).

2. Multiple Repodata Instances/Tracking

For those who know how YUM works against an HTTP-accessed repository, it is the "repodata" directory that holds all meta-data/package list information. In order for any resolution to occur on an arbitrary date/tag, there must be a way to control/delta/host multiple Repodata Instances and track them.

3. Resolution

There must be a new set of mechanisms for resolving which Repodata Instance to use, given an option in the YUM client. Either a radical change must occur in the way YUM operates (i.e., repositories are accessed via HTTP), or the client must take on this additional burden.

SOLUTION

1. Doesn't change regardless of solution -- unless, of course, the server wants the additional overhead of binary deltas between RPMs. No thanks, we'll just keep all RPM package releases.

2/3. An evolutionary approach is used: no fancy version control system, just a flat directory system, all still accessed via HTTP. This is how it works.

New Subdirectory: ./repodelta

A. Tree organization

Repodelta is a tree of subdirectories whose names are the string of absolute POSIX seconds assuming UTC -- i.e., the same output as "date -u +%s", e.g.,

  ./repodelta/1126278362

B. Tree index

Since HTTP does not lend itself to file management (at least not without something like WebDAV on both the server and client), there will be an index file of the tree. This should be a simple file, something like "./repodelta/releases.txt" or similar. It should be a flat text file with just the `date -u +%s` value on each line, although an MD5 or other checksum could be appended to the end, or kept in a separate file, to verify its integrity.

C. New createrepo option: --delta

createrepo should have a new option called "--delta". Instead of generating new meta-data files in the "./repodata" subdirectory, it creates a new "./repodelta/`date -u +%s`" subdirectory with new meta-data files, then symlinks ./repodata/ to it.

It also re-reads the "./repodelta/" directory, looks for subdirectories (the [ -d ] "test" in most scripting languages) with valid meta-data indices (this would be more subjective/involved), and regenerates the "./repodelta/releases.txt" file with a date index to those releases. Again, we do this since standard HTTP does not define file operations such as reading a directory list.

D. Tags: Symlinks

Creating a tag is as simple as symlinking a new name to one of the `date -u +%s` format directories. The createrepo "--delta" option will look for symlinks under "./repodelta" and add each tag in "./repodelta/releases.txt" after the de-referenced directory name (whitespace delimited). E.g., given the directory listing:

  ./repodelta/1126278362
  ./repodelta/4.1 -> 1126278362
  ./repodelta/current -> 1126278362

The line in ./repodelta/releases.txt would be:

  1126278362 4.1 current

It probably wouldn't hurt for a reserved "HEAD" tag to be automagically generated for the new directory whenever createrepo --delta is run.

E. YUM client options

New YUM clients would have to be written with new options and resolution logic -- date and tag options.
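As a point of reference for what the client would consume, the index regeneration described in C and D could look roughly like the sketch below. It is only an illustration, not part of createrepo: the function name build_release_index is made up here, the "valid meta-data" check is left out entirely, HEAD is treated as a real directory (as the note under F suggests), and the newest-first ordering is an assumption taken from the example listing.

```python
import os


def build_release_index(repodelta_dir):
    """Return the lines of releases.txt for a ./repodelta tree, newest first."""
    snapshots = {}      # epoch directory name -> tag names pointing at it
    links = []          # (tag name, dereferenced directory name)
    has_head = False

    for name in sorted(os.listdir(repodelta_dir)):
        path = os.path.join(repodelta_dir, name)
        if os.path.islink(path):
            # A tag: remember which directory the symlink resolves to.
            links.append((name, os.path.basename(os.path.realpath(path))))
        elif os.path.isdir(path) and name.isdigit():
            # A snapshot directory named after `date -u +%s`.
            snapshots[name] = []
        elif os.path.isdir(path) and name == "HEAD":
            # Hypothetical handling: HEAD as a real, non-symlink directory.
            has_head = True

    for tag, target in links:
        if target in snapshots:       # skip dangling or unrelated symlinks
            snapshots[target].append(tag)

    lines = ["HEAD"] if has_head else []
    for epoch in sorted(snapshots, key=int, reverse=True):
        lines.append(" ".join([epoch] + snapshots[epoch]))
    return lines
```

Writing the returned lines to ./repodelta/releases.txt, one per line, would give a client everything it needs without ever listing a directory over HTTP.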
Resolution is straightforward for tags, although a decision would have to be made on whether to default to HEAD or give an error if a tag doesn't exist.

Date resolution is almost as straightforward -- given a date/time (in a variety of formats -- absolute seconds, traditional format, reverse offset from current, etc...), the YUM client will look for the closest date that is "no later" than the one given. Kinda like the Price-is-Right: the closest without going over (the given date/time).

F. Backward compatibility, both client and server

Because ./repodata/ is just a symlink, legacy YUM clients work without modification. And in the case where someone runs a non-repodelta version of createrepo, only the last repo is overwritten.

NOTE: An option to avoid this might be to have ./repodata not be a symlink to a subdirectory in ./repodelta/, but a copy of the appropriate subdirectory.

Another option would be to leave the symlink, but have it always point to a reserved ./repodelta/HEAD/ directory that is not itself a symlink. That way you always know that ./repodelta/`date -u +%s` is true to that date, and the only consideration is whether ./repodelta/HEAD/ still matches the latest ./repodelta/`date -u +%s`. In fact, that might be most ideal: HEAD is _never_ a tag/symlink, but its own, real subdirectory.

EXAMPLE:

An example directory structure might be:

  ./repodata
  ./repodelta/1125372135
  ./repodelta/1126278362
  ./repodelta/4.0 -> 1125372135
  ./repodelta/os -> 1125372135
  ./repodelta/4.1 -> 1126278362
  ./repodelta/HEAD

Where ./repodelta/releases.txt contains:

  HEAD
  1126278362 4.1
  1125372135 4.0 os

HEAD may or may not be the same as 1126278362. Again, referencing "F" above, this is to prevent 1126278362 from being changed in case someone runs "createrepo" without the "--delta" option. When 1126278362 was created with "--delta", its contents were copied into HEAD, so if no one has run createrepo since, the contents will match.
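To make the client-side resolution in E concrete, here is a rough sketch of how a client might read releases.txt and apply the "closest without going over" rule. None of this is existing yum code: parse_releases, resolve_tag and resolve_date are hypothetical names, the file layout is assumed to be the one in the example above, and raising an error for an unknown tag (rather than falling back to HEAD) is just one of the two choices left open.

```python
import bisect


def parse_releases(text):
    """Parse releases.txt into (ascending epoch list, tag -> directory name)."""
    epochs, tags = [], {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].isdigit():
            # "<epoch> <tag> <tag> ...": every tag aliases the epoch directory.
            epochs.append(int(fields[0]))
            for tag in fields[1:]:
                tags[tag] = fields[0]
        else:
            # A bare name such as HEAD is a real directory, not an alias.
            tags[fields[0]] = fields[0]
    return sorted(epochs), tags


def resolve_tag(tags, tag):
    """Return the directory name for a tag; unknown tags are an error here."""
    if tag not in tags:
        raise KeyError("no such tag: %s" % tag)
    return tags[tag]


def resolve_date(epochs, when):
    """Return the latest snapshot whose epoch is <= when, or None if none is."""
    i = bisect.bisect_right(epochs, when)
    return str(epochs[i - 1]) if i else None
```

With the example index, a request for any time between the two snapshots would resolve to ./repodelta/1125372135, and a request earlier than the oldest snapshot would resolve to nothing at all, which the client would have to report somehow.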
Again, this "hack" is probably very easy to implement on the server side. In fact, it could be run daily or on another regular period. If new files have been added (Q: detected by date, or by comparing to the previous meta-data?), it will create a new ./repodelta/`date -u +%s` subdirectory and perform the other operations. If not, there is no sense in creating a second, exact copy of the previous run.

--
Bryan J. Smith                | Sent from Yahoo Mail
mailto:b.j.smith at ieee.org  | (please excuse any
http://thebs413.blogspot.com/ | missing headers)